Goto

Collaborating Authors

 lemma 16





On the $h$-majority dynamics with many opinions

arXiv.org Artificial Intelligence

We present the first upper bound on the convergence time to consensus of the well-known $h$-majority dynamics with $k$ opinions, in the synchronous setting, for $h$ and $k$ that are both non-constant values. We suppose that, at the beginning of the process, there is some initial additive bias towards some plurality opinion, that is, there is an opinion that is supported by $x$ nodes while any other opinion is supported by strictly fewer nodes. We prove that, with high probability, if the bias is $ฯ‰(\sqrt{x})$ and the initial plurality opinion is supported by at least $x = ฯ‰(\log n)$ nodes, then the process converges to plurality consensus in $O(\log n)$ rounds whenever $h = ฯ‰(n \log n / x)$. A main corollary is the following: if $k = o(n / \log n)$ and the process starts from an almost-balanced configuration with an initial bias of magnitude $ฯ‰(\sqrt{n/k})$ towards the initial plurality opinion, then any function $h = ฯ‰(k \log n)$ suffices to guarantee convergence to consensus in $O(\log n)$ rounds, with high probability. Our upper bound shows that the lower bound of $ฮฉ(k / h^2)$ rounds to reach consensus given by Becchetti et al. (2017) cannot be pushed further than $\widetildeฮฉ(k / h)$. Moreover, the bias we require is asymptotically smaller than the $ฮฉ(\sqrt{n\log n})$ bias that guarantees plurality consensus in the $3$-majority dynamics: in our case, the required bias is at most any (arbitrarily small) function in $ฯ‰(\sqrt{x})$ for any value of $k \ge 2$.


A Regression Approach to Learning Augmented Online Algorithms (Supplementary)

Neural Information Processing Systems

Do the main claims made in the abstract and introduction accurately reflect the paper's Did you discuss any potential negative societal impacts of your work? Did you include complete proofs of all theoretical results? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL?[N/A] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) In this section, we prove Theorems 6 and 12 which give upper bounds on the sample complexity in the standard and agnostic settings respectively. The following is a well-known result that relates covering numbers to the pseudo dimension (cf. A.1 The Standard Model: Proof of Theorem 6 First, we relate covering numbers to this error measure.


Minimum intrinsic dimension scaling for entropic optimal transport

arXiv.org Artificial Intelligence

Motivated by the manifold hypothesis, which states that data with a high extrinsic dimension may yet have a low intrinsic dimension, we develop refined statistical bounds for entropic optimal transport that are sensitive to the intrinsic dimension of the data. Our bounds involve a robust notion of intrinsic dimension, measured at only a single distance scale depending on the regularization parameter, and show that it is only the minimum of these single-scale intrinsic dimensions which governs the rate of convergence. We call this the Minimum Intrinsic Dimension scaling (MID scaling) phenomenon, and establish MID scaling with no assumptions on the data distributions so long as the cost is bounded and Lipschitz, and for various entropic optimal transport quantities beyond just values, with stronger analogs when one distribution is supported on a manifold. Our results significantly advance the theoretical state of the art by showing that MID scaling is a generic phenomenon, and provide the first rigorous interpretation of the statistical effect of entropic regularization as a distance scale.


Beyond L1: Faster and Better Sparse Models with skglm

arXiv.org Artificial Intelligence

We propose a new fast algorithm to estimate any sparse generalized linear model with convex or non-convex separable penalties. Our algorithm is able to solve problems with millions of samples and features in seconds, by relying on coordinate descent, working sets and Anderson acceleration. It handles previously unaddressed models, and is extensively shown to improve state-of-art algorithms. We release skglm, a flexible, scikit-learn compatible package, which easily handles customized datafits and penalties.


Pure Exploration in Multi-armed Bandits with Graph Side Information

arXiv.org Machine Learning

The multi-armed bandit has emerged as an important paradigm for modeling sequential decision making and learning under uncertainty with multiple practical applications such as design policies for sequential experiments [30], combinatorial online leaning tasks [6], collaborative learning on social media networks [21, 2], latency reduction in cloud systems [18] and many others [5, 41, 36]. In the traditional multi-armed bandit problem, the goal of the agent is to sequentially choose among a set of actions (or arms) to maximize a desired performance criterion (or reward). This objective demands a delicate tradeoff between exploration (of new arms) and exploitation (of promising arms). An important variation of the reward maximization problem is the identification of arms with the highest (or near-highest) expected reward. This best arm identification [28, 8] problem, which is one of pure exploration, has a wide range of important applications like identifying molecules and drugs to treat infectious diseases like COVID-19, finding relevant users to run targeted ad campaigns, hyperparameter optimization in neural networks and recommendation systems. The broad range of applications of this paradigm is unsurprising given its ability to essentially model any optimization problem of black-box functions on discrete (or discretizable) domains with noisy observations. While the bandit pure exploration problems harbor considerable promise, there is a significant catch. In modern applications, one is often faced with a tremendously large number of options (sometimes in the millions) that need to be considered rapidly before making a decision. Pulling each bandit arm even once could be intractable.


Adaptive transfer learning

arXiv.org Machine Learning

In transfer learning, we wish to make inference about a target population when we have access to data both from the distribution itself, and from a different but related source distribution. We introduce a flexible framework for transfer learning in the context of binary classification, allowing for covariate-dependent relationships between the source and target distributions that are not required to preserve the Bayes decision boundary. Our main contributions are to derive the minimax optimal rates of convergence (up to poly-logarithmic factors) in this problem, and show that the optimal rate can be achieved by an algorithm that adapts to key aspects of the unknown transfer relationship, as well as the smoothness and tail parameters of our distributional classes. This optimal rate turns out to have several regimes, depending on the interplay between the relative sample sizes and the strength of the transfer relationship, and our algorithm achieves optimality by careful, decision tree-based calibration of local nearest-neighbour procedures.


Consistency of random-walk based network embedding algorithms

arXiv.org Machine Learning

Random-walk based network embedding algorithms like node2vec and DeepWalk are widely used to obtain Euclidean representation of the nodes in a network prior to performing down-stream network inference tasks. Nevertheless, despite their impressive empirical performance, there is a lack of theoretical results explaining their behavior. In this paper we studied the node2vec and DeepWalk algorithms through the perspective of matrix factorization. We analyze these algorithms in the setting of community detection for stochastic blockmodel graphs; in particular we established large-sample error bounds and prove consistent community recovery of node2vec/DeepWalk embedding followed by k-means clustering. Our theoretical results indicate a subtle interplay between the sparsity of the observed networks, the window sizes of the random walks, and the convergence rates of the node2vec/DeepWalk embedding toward the embedding of the true but unknown edge probabilities matrix. More specifically, as the network becomes sparser, our results suggest using larger window sizes, or equivalently, taking longer random walks, in order to attain better convergence rate for the resulting embeddings. The paper includes numerical experiments corroborating these observations.